Greedy Approach for Subspace Clustering from Corrupted and Incomplete Data






A. Petukhov¹, I. Kozlov²

¹Contact Author, Department of Mathematics, University of Georgia, Athens, GA 30602, USA, petukhov@math.uga.edu
²Algosoft Tech USA, Bishop, GA, USA, inna@algosoft-tech.com



Abstract: We describe the Greedy Sparse Subspace Clustering (GSSC) algorithm, an efficient method for clustering data that belong to a few low-dimensional linear or affine subspaces when the data are incomplete, corrupted, and noisy. We provide numerical evidence that, even in its simplest implementation, the greedy approach significantly increases the subspace clustering capability of the existing state-of-the-art SSC algorithm.

1. Introduction

We consider an algorithm, based on a greedy strategy, for the preprocessing of a vector database that is necessary for subspace clustering. The problem of subspace clustering consists in classifying vector data that belong to a few linear or affine low-dimensional subspaces of a high-dimensional ambient space when neither the subspaces nor even their dimensions are known, i.e., they have to be identified from the same database.

The algorithm considered below does not require clean input data. On the contrary, we assume that the data are corrupted with sparse errors and with random noise distributed over all vector entries, and that a significant part of the data is missing.

The property of errors to constitute a sparse set means 
that some (but not all) vector entries are corrupted, i.e., 
those values are randomly replaced. The locations (indices) 
of corrupted entries are unknown. 

The noise is randomly introduced in each entry. Its mag- 
nitude is usually much less than the data magnitude. 

In information theory, missing samples are called erasures. They have two main features. First, the data values in erasures have no practical importance; the most natural way to think about erasures is as lost data. Second, the coordinates of erasures are known.

More formally, we have $N$ vectors $\{y_i\}_{i=1}^N$ in $K$ linear or affine subspaces $\{S^l\}_{l=1}^K$ with dimensions $\{d_l\}_{l=1}^K$ of the $D$-dimensional Euclidean space $\mathbb{R}^D$. We do not assume that those subspaces have only trivial intersections. However, we do assume that none of those subspaces is a subspace of another one. At the same time, the situation when one subspace is a subspace of the direct sum of two other subspaces is allowed. Such settings inspire the hope that, when $N$ is large enough and the points are randomly and independently distributed on those planes, some sophisticated algorithm can identify the planes and assign each point to one of the found subspaces. Of course, some points may belong to the intersection of two or more subspaces; such a point may be assigned to one of those subspaces or to all of them. With probability 1, each point belongs to only one of the subspaces. Then the problem consists in finding a permutation matrix $\Gamma$ such that

$$[Y_1, \dots, Y_K] = Y\Gamma,$$

where $Y \in \mathbb{R}^{D\times N}$ is an input matrix whose columns are the given points in an arbitrary random order, whereas $[Y_1, \dots, Y_K]$ is a rearrangement of the matrix $Y$ in accordance with the affiliation of the vectors with the subspaces $S^l$.

The problem of finding the clusters $\{Y_l\}$ is usually solved by finding clusters in a graph whose edges (to be more precise, the weights of the edges) characterize the interconnection between pairs of vertices. In our case, the popular method of clustering lets the points play the role of the vertices, while the weights are set from the coefficients of the decomposition of each vector through other vectors from the same subspace $S^l$. This idea looks like a vicious circle: we are trying to identify the subspace $S^l$ accommodating the vector $y_i$ by using its linear decomposition in the remaining vectors of $S^l$. However, the situation is not hopeless at all.
In m, the excellent suggestion was formulated. Obviously, 
the decomposition problem formulated above is reduced to 
solving the non-convex problem 

$$\|c_i\|_0 \to \min, \quad \text{subject to } y_i = Y c_i,\ c_{ii} = 0,\quad i = 1, \dots, N; \qquad (1)$$

where $\|x\|_0$ is the Hamming weight of the vector $x$, $\|x\|_0 = \#\{j : x_j \ne 0\}$. The problem of finding the sparsest solutions to (1), the so-called Compressed Sensing or Compressive Sampling problem, underwent a thorough study that originated in [3], [6], [17] and continued in hundreds of theoretical and applied papers.

In the ideal world of perfect computational precision and unlimited computational power, with probability one, the solutions of (1) would point out the elements of the appropriate $S^l$ by their non-zero decomposition coefficients. Provided that no vectors of wrong subspaces participate in the decomposition of each column of $Y$, the matrix $C$ whose columns are $c_i$ allows a perfect reconstruction of the structure of the subspaces in polynomial time. There are two obstacles on that way.

First, the precision of the input matrix $Y$ is not perfect, so the decompositions may pick up wrong vectors even if we are able to solve problem (1). In this case, the problem of subspace clustering is considered for the similarity graph defined by the symmetric matrix $W := |C| + |C^T|$. While, generally speaking, this problem has non-polynomial complexity, there exist practical algorithms that cluster correctly when the "false" interconnections between elements of different subspaces are neither very dense nor very intensive. Following [8], we use a modification of the spectral clustering algorithm from [12], which is specified in [11] as the "graph's random walk Laplacian".
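
To make the construction of the similarity graph concrete, here is a minimal NumPy sketch. It builds $W = |C| + |C^T|$ and a spectral embedding from the random-walk Laplacian $I - D^{-1}W$. The function names, the toy matrix `C_example`, and the choice of the `k` eigenvectors with the smallest eigenvalues are assumptions of this sketch, not details taken from [8], [11], or [12]; the final grouping of the embedded rows (e.g., by k-means) is omitted.

```python
import numpy as np

def similarity_graph(C):
    """Symmetric similarity matrix W = |C| + |C^T|."""
    return np.abs(C) + np.abs(C.T)

def random_walk_embedding(W, k):
    """Rows of the k eigenvectors of I - D^{-1} W with the smallest eigenvalues."""
    d = W.sum(axis=1)                       # vertex degrees (assumed positive)
    d_isqrt = 1.0 / np.sqrt(d)
    # Work with the symmetric matrix D^{-1/2} W D^{-1/2} so that eigh applies.
    S = d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(np.eye(len(d)) - S)   # ascending eigenvalues
    # Eigenvectors of the random-walk Laplacian are D^{-1/2} times these vectors.
    return d_isqrt[:, None] * vecs[:, :k]

# Toy coefficient matrix with two groups of points: {0, 1} and {2, 3}.
C_example = np.array([[0.0, 1.0, 0.0, 0.0],
                      [0.5, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 2.0],
                      [0.0, 0.0, 1.0, 0.0]])
W = similarity_graph(C_example)
U = random_walk_embedding(W, k=2)
```

In this toy case the graph has two connected components, so the rows of `U` coincide within each group and differ between the groups.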

The second obstacle is the non-polynomial complexity of problem (1). An elegant way to overcome this obstacle is to replace the non-convex problem (1) with the convex problem

$$\|c_i\|_1 \to \min, \quad \text{subject to } y_i = Y c_i,\ c_{ii} = 0,\quad i = 1, \dots, N; \qquad (2)$$

or

$$\|C\|_1 \to \min, \quad \text{subject to } Y = YC,\ \operatorname{diag}(C) = 0; \qquad (3)$$

in the matrix form. It follows from the fundamental results of [3], [6], [17] that, for matrices $Y$ satisfying some reasonable restrictions and for a not very large Hamming weight of the ideal sparse solution, that solution can be uniquely found by solving the convex problem (2). There are other, more efficient $\ell^1$-based methods for finding sparse solutions (e.g., see [4], [13], [9]).

It should be mentioned that the matrices $Y$ in practical problems may be very far from meeting the requirements for uniqueness of solutions. At the same time, uniqueness of the solutions is not necessary in our settings. We just wish to have the maximum separation between the indices of the matrix $W$ corresponding to different subspaces.

In the case of successful clustering, the results for each $S^l$ may be used for further processing such as noise removal, error correction, and so on. Such procedures become significantly more efficient when applied to low-rank submatrices of $Y$ corresponding to one subspace.

Thus, in applications, the problems involving subspace clustering can be split into 3 stages: 1) preprocessing; 2) search for clusters in graphs; 3) processing on clusters. In this paper, we develop a first-stage algorithm that helps to perform the second stage much more efficiently than the state-of-the-art algorithms. We do not discuss any improvements of stage 2. We just take one such algorithm, specifically spectral clustering, and use it to compare the influence of our preprocessing algorithm and of competing ones on the efficiency of clustering.



As for stage 3, its content depends on the applied problem requesting subspace clustering. One typical goal of the third stage is data recovery from incomplete and corrupted measurements. Sometimes this problem is called the "Netflix problem". We will discuss below how the same problems of incompleteness and corruption can be solved within clustering preprocessing. However, for low-rank matrices it can be solved more efficiently. Among many
existing algorithms we mention the most recent papers [2|, 
EOl, ins, mi, inn providing the best results for input 
having both erasures and errors. 

In Section 2, we discuss the formal problem settings. In Section 3, the existing SSC algorithm and our modification are given. The results of numerical experiments showing the consistency of the proposed approach are given in Section 4.

2. Problem Settings 

We use the Sparse Subspace Clustering (SSC) algorithm from [8] as the basic algorithm for our modification based on a greedy approach. Therefore, a significant part of the reasoning in this section can be found in [8] and in the earlier paper [7]. Very similar ideas of subspace self-representation for subspace clustering were also used in [16]. However, the error resilience mechanism in that paper was used under the assumption that there are enough uncorrupted data vectors, whereas this assumption is not required in [8].

The CS optimization problems (2) and (3) assume that the data are clean, i.e., they have no noise and errors. Within the standard CS framework, problem (1) can be reformulated as finding the sparsest vectors $c$ (decomposition coefficients) and $e$ (errors) satisfying the system of linear equations $y = Ac + e$.

It was mentioned in [18] that the last system can be rewritten as

$$y = [A\ \ I]\begin{bmatrix} c \\ e \end{bmatrix}, \qquad (4)$$

where $I$ is the identity matrix. Therefore, the problems of sparse reconstruction and error correction can be solved simultaneously with CS methods. In [14], we designed an algorithm that efficiently finds sparse solutions to (4).

Unfortunately, subspace clustering cannot use this strategy straightforwardly because not only the "measurements" $y_i$ in (1) are corrupted but the "measuring matrix" $A = Y$ can also be corrupted. It should be mentioned that if the error probability is so low that there exist uncorrupted columns of $Y$ constituting bases for all subspaces $\{S^l\}$, the method from [14] can solve the problem of sparse representation with simultaneous error correction. In what follows, the considered algorithm will admit a significantly higher error rate. In particular, all columns of $Y$ may be corrupted.

Following [8], we introduce two $D \times N$ matrices $E$ and $Z$ that are unknown to the algorithm. The matrix $E$ contains a sparse (i.e., $\#\{E_{ij} \ne 0\} \ll DN$) set of errors with relatively large magnitudes. The matrix $Z$ defines the noise, which has relatively low magnitude but is distributed over all entries of $Z$. Thus, the clean data are representable as $Y - E - Z$. Therefore, when the data are corrupted with sparse errors and noise, the self-representation has to be written for the clean data, $Y - E - Z = (Y - E - Z)C$, so the equation $Y = YC$ has to be replaced by

$$Y = YC + E(I - C) + Z(I - C). \qquad (5)$$
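
The identity (5) can be checked numerically. The sketch below is only an illustration under assumed toy sizes: it builds clean rank-2 data `X`, a self-representation matrix `C` with zero diagonal (computed by least squares, so it is not the sparse solution sought by SSC), and arbitrary `E` and `Z`, and then evaluates the residual of (5).

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, d = 6, 8, 2
X = rng.standard_normal((D, d)) @ rng.standard_normal((d, N))  # clean data of rank d

# Self-representation X = X C with diag(C) = 0: column i is expressed via the others.
C = np.zeros((N, N))
for i in range(N):
    others = [j for j in range(N) if j != i]
    C[others, i] = np.linalg.lstsq(X[:, others], X[:, i], rcond=None)[0]

E = np.zeros((D, N))
E[rng.random((D, N)) < 0.1] = 3.0            # sparse errors of large magnitude
Z = 0.01 * rng.standard_normal((D, N))       # dense noise of low magnitude
Y = X + E + Z                                # observed data

I = np.eye(N)
residual = Y - (Y @ C + E @ (I - C) + Z @ (I - C))
```

The residual vanishes up to rounding because it equals $X - XC$.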



The authors of [8] applied a reasonable simplification of the problem by replacing the last 2 terms of (5) with some (unknown) sparse matrix $\hat E := E(I - C)$ and with the matrix of deformed noise $\hat Z := Z(I - C)$. Provided that a sparse $C$ exists, the matrix $\hat E$ still has to be sparse. This transformation leads to some simplification of the optimization procedure. It is an admissible simplification since, generally speaking, we do not need to correct and denoise the input data $Y$; our only goal is to find the sparse matrix $C$. Therefore, we do not need the original matrices $E$ and $Z$, and from now on the symbols $E$ and $Z$ denote $\hat E$ and $\hat Z$. While, as we mentioned above, the error correction procedure can be applied after subspace clustering, the original setting (5) with the genuine values of the errors and noise within subspace clustering still makes sense and deserves to become a topic for future research.

Taking into account the modifications from the last paragraph, the authors of [8] formulate the constrained optimization problem

$$\min\ \|C\|_1 + \lambda_e\|E\|_1 + \frac{\lambda_z}{2}\|Z\|_F^2 \quad \text{s.t. } Y = YC + E + Z,\ \operatorname{diag}(C) = 0, \qquad (6)$$

where $\|\cdot\|_F$ is the Frobenius norm of a matrix. If clustering into affine subspaces is required, the additional constraint $C^T\mathbf{1} = \mathbf{1}$ is added.

On the next step, using the representation $Z = Y - YC - E$ and introducing an auxiliary matrix $A \in \mathbb{R}^{N\times N}$, the constrained optimization problem (6) is transformed into

$$\min\ \|C\|_1 + \lambda_e\|E\|_1 + \frac{\lambda_z}{2}\|Y - YA - E\|_F^2 \quad \text{s.t. } A^T\mathbf{1} = \mathbf{1},\ A = C - \operatorname{diag}(C). \qquad (7)$$

Optimization problems (6) and (7) are equivalent. Indeed, at the point of extremum of (7), obviously $\operatorname{diag}(C) = 0$. Hence, $A = C$.

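At last, the quadratic penalty functions with the weight $\rho/2$ corresponding to the constraints are added to the functional in (7), and the Lagrangian functional is composed. The final Lagrangian functional is as follows:

$$\begin{aligned}\mathcal{L}(C, A, E, \delta, \Delta) ={}& \|C\|_1 + \lambda_e\|E\|_1 + \frac{\lambda_z}{2}\|Y - YA - E\|_F^2 \\ &+ \frac{\rho}{2}\|A^T\mathbf{1} - \mathbf{1}\|_2^2 + \frac{\rho}{2}\|A - C + \operatorname{diag}(C)\|_F^2 \\ &+ \delta^T(A^T\mathbf{1} - \mathbf{1}) + \operatorname{tr}\big(\Delta^T(A - C + \operatorname{diag}(C))\big), \end{aligned} \qquad (8)$$

where the vector $\delta$ and the matrix $\Delta$ are Lagrangian coefficients. Obviously, since the penalty functions are formed from the constraints, they do not change the point and value of the minimum.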



3. Algorithm 

For the minimization of functional (8), the Alternating Direction Method of Multipliers (ADMM, [1]) is used. In [8], this is the crucial part of the entire algorithm, which is called the Sparse Subspace Clustering algorithm.

The parameters $\lambda_e$ and $\lambda_z$ in (8) are selected in advance. They define the compromise between a good approximation of $Y$ with $YC$ and a high sparsity of $C$. The general rule is to set larger values of the parameters for lower levels of noise or errors. In [8], the selection of the parameters by the formulas

$$\lambda_e = \alpha_e/\mu_e, \qquad \lambda_z = \alpha_z/\mu_z, \qquad (9)$$

where $\alpha_e, \alpha_z > 1$ and

$$\mu_e := \min_i \max_{j \ne i} \|y_j\|_1, \qquad \mu_z := \min_i \max_{j \ne i} |y_i^T y_j|,$$

is recommended.
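
As a small worked example of this parameter rule, the sketch below computes $\mu_e = \min_i \max_{j\ne i}\|y_j\|_1$ and $\mu_z = \min_i \max_{j\ne i}|y_i^Ty_j|$ and then $\lambda_e = \alpha_e/\mu_e$, $\lambda_z = \alpha_z/\mu_z$. The function name `ssc_lambdas` and the toy matrix are hypothetical, and the formulas are used as we read them from (9).

```python
import numpy as np

def ssc_lambdas(Y, alpha_e, alpha_z):
    """Return (lambda_e, lambda_z) computed from the columns of Y."""
    N = Y.shape[1]
    off = ~np.eye(N, dtype=bool)                 # off[i] selects all j != i
    col_l1 = np.abs(Y).sum(axis=0)               # ||y_j||_1 for every column
    mu_e = min(col_l1[off[i]].max() for i in range(N))
    G = np.abs(Y.T @ Y)                          # |y_i^T y_j|
    mu_z = min(G[i, off[i]].max() for i in range(N))
    return alpha_e / mu_e, alpha_z / mu_z

# Columns y_1 = (1, 0), y_2 = (1, 3), y_3 = (2, 1): mu_e = 3 and mu_z = 2.
Y_small = np.array([[1.0, 1.0, 2.0],
                    [0.0, 3.0, 1.0]])
lam_e, lam_z = ssc_lambdas(Y_small, alpha_e=5.0, alpha_z=50.0)
```

Note that the rule breaks down if some column is orthogonal to all the others, since then $\mu_z = 0$; the sketch does not guard against that case.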

The initial parameter $\rho = \rho^0$ is updated as $\rho := \rho^{k+1} = \rho^k\mu$ with the iterations of the SSC algorithm. We notice that, adding the penalty terms, we do not change the problem. It still has the same minimum. However, an appropriate selection of $\mu$ and $\rho^0$ accelerates the convergence of the algorithm significantly.

Each iteration of the algorithm is based on consecutive optimization with respect to each of the unknown values $A$, $C$, $E$, $\delta$, $\Delta$, which are initialized with zeros before the algorithm starts.

Due to the appropriate form of functional (8), the optimization of each value is simple and computationally efficient. The five formulas for updating the unknown values are discussed below.

The matrix $A^{k+1}$ is a solution of the system of linear equations

$$(\lambda_z Y^TY + \rho^k I + \rho^k\mathbf{1}\mathbf{1}^T)A^{k+1} = \lambda_z Y^T(Y - E^k) + \rho^k(\mathbf{1}\mathbf{1}^T + C^k) - \mathbf{1}\delta^{kT} - \Delta^k. \qquad (10)$$

When the data are located on linear subspaces, the terms $\mathbf{1}\mathbf{1}^T$ and $\mathbf{1}\delta^{kT}$ may be removed (set to 0) from (10).

While the system (10) has matrices of size $N \times N$, due to its special form, the complexity of computing the inverse matrix is $O(D^2N)$, which is much lower than $O(N^3)$, provided that $D \ll N$. Unfortunately, the matrix $A$ may have full rank. Therefore, the computational cost of its product with the inverse matrix is $O(DN^2)$, i.e., not as impressive as for the matrix inversion.
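
One standard way to exploit the special form of the system in the linear-subspace case is the Woodbury identity, $(\lambda_z Y^TY + \rho I)^{-1} = \rho^{-1}\big(I - Y^T((\rho/\lambda_z) I_D + YY^T)^{-1}Y\big)$, which only requires a $D \times D$ solve. The sketch below illustrates this; it is our assumption about how the stated complexity can be reached, not necessarily the implementation used in [8], and the function name is hypothetical.

```python
import numpy as np

def solve_linear_case(Y, B, lam_z, rho):
    """Solve (lam_z * Y^T Y + rho * I) X = B using only a D x D linear system."""
    D = Y.shape[0]
    small = (rho / lam_z) * np.eye(D) + Y @ Y.T      # D x D matrix, O(D^2 N) to form
    return (B - Y.T @ np.linalg.solve(small, Y @ B)) / rho

rng = np.random.default_rng(1)
Y = rng.standard_normal((5, 40))                     # D = 5, N = 40
B = rng.standard_normal((40, 40))                    # full-rank right-hand side
X_fast = solve_linear_case(Y, B, lam_z=2.0, rho=10.0)
X_ref = np.linalg.solve(2.0 * Y.T @ Y + 10.0 * np.eye(40), B)
```

The products `Y @ B` and `Y.T @ (...)` with the $N \times N$ right-hand side are the $O(DN^2)$ part mentioned in the text.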

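We will need the following notation:

$$S_\epsilon[x] := \begin{cases} x - \epsilon, & x > \epsilon, \\ x + \epsilon, & x < -\epsilon, \\ 0, & \text{otherwise}; \end{cases}$$

where $x$ can be either a number or a vector or a matrix (in the last two cases the operator is applied entrywise). The operator $S_\epsilon[\cdot]$ is called the shrinkage operator.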
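
A minimal NumPy sketch of the shrinkage operator follows; the function name `shrink` is our choice. It uses the equivalent closed form $\operatorname{sign}(x)\max(|x| - \epsilon, 0)$.

```python
import numpy as np

def shrink(x, eps):
    """Entrywise shrinkage (soft thresholding) of a number, vector, or matrix."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

shrunk = shrink([-3.0, -0.5, 0.0, 0.5, 3.0], 1.0)
```

Entries with magnitude at most `eps` are set to zero and all other entries are moved towards zero by `eps`.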



Let

$$J := S_{1/\rho^k}\left[A^{k+1} + \Delta^k/\rho^k\right].$$

Then the matrix $C^{k+1}$ is defined by the formula

$$C^{k+1} := J - \operatorname{diag}(J). \qquad (11)$$

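The remaining values $E^{k+1}$, $\delta^{k+1}$, and $\Delta^{k+1}$ are computed as

$$E^{k+1} := S_{\lambda_e/\lambda_z}\left[Y - YA^{k+1}\right], \qquad (12)$$

$$\delta^{k+1} := \delta^k + \rho^k\big((A^{k+1})^T\mathbf{1} - \mathbf{1}\big), \qquad (13)$$

$$\Delta^{k+1} := \Delta^k + \rho^k(A^{k+1} - C^{k+1}). \qquad (14)$$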
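
The update formulas can be assembled into a loop as in the following sketch for linear subspaces, where the terms with the all-ones vector and the multiplier $\delta$ are dropped. This is an illustrative reading of (10)-(14), not the reference implementation of [8]: the function names, the plain `np.linalg.solve` call, and the use of the maximum absolute entry in the stopping test are assumptions.

```python
import numpy as np

def shrink(x, eps):
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def ssc_admm_linear(Y, lam_e, lam_z, rho=10.0, mu=1.05, tol=1e-3, max_iter=200):
    """ADMM sketch for linear subspaces; returns the coefficient and error matrices."""
    D, N = Y.shape
    A = np.zeros((N, N))
    C = np.zeros((N, N))
    E = np.zeros((D, N))
    Delta = np.zeros((N, N))
    G = lam_z * Y.T @ Y
    for _ in range(max_iter):
        A_old, E_old = A, E
        # (10) without the terms that involve the all-ones vector
        A = np.linalg.solve(G + rho * np.eye(N),
                            lam_z * Y.T @ (Y - E) + rho * C - Delta)
        # (11): shrink, then remove the diagonal
        J = shrink(A + Delta / rho, 1.0 / rho)
        C = J - np.diag(np.diag(J))
        # (12): sparse error estimate
        E = shrink(Y - Y @ A, lam_e / lam_z)
        # (14): multiplier update; (13) is not needed in the linear case
        Delta = Delta + rho * (A - C)
        rho *= mu
        # stop when both A and E have stabilized (max-abs norm is an assumption)
        if max(np.abs(A - A_old).max(), np.abs(E - E_old).max()) < tol:
            break
    return C, E

rng = np.random.default_rng(0)
Y = np.hstack([rng.standard_normal((10, 2)) @ rng.standard_normal((2, 6))
               for _ in range(2)])                   # two 2-dimensional subspaces
C, E = ssc_admm_linear(Y, lam_e=2.0, lam_z=20.0)
```

The default values of `rho`, `mu`, and `tol` mirror the settings reported in Section 4; the values of `lam_e` and `lam_z` in the usage lines are arbitrary.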



The algorithm goes to the next iteration if one of the conditions

$$\|A^{k+1} - A^k\| < \epsilon, \qquad \|E^{k+1} - E^k\| < \epsilon$$

fails, where $\epsilon$ is the given error tolerance.

In the form shown above SSC algorithm gives the state- 
of-the-art benchmarks for subspace clustering problems. 

Our suggestion is to employ ideas of greedy algorithms to increase the capability of the SSC algorithm in subspace clustering. Greedy algorithms are very popular in non-linear approximation (especially in redundant systems), where the global optimization is replaced with an iterative selection of the most probable candidates from the point of view of their prospective contribution to the approximation. The procedure is repeated with the selection of new entries, considering the previously selected entries as reliable, with guaranteed participation in the approximation. The most typical case is the Orthogonal Greedy Algorithm (OGA), which consists in selecting the approximating entries having the biggest inner products with the current approximation residual, followed by the orthogonal projection of the approximated object onto the span of the selected entries.

In many cases, OGA finds the sparsest representations if they exist. In [9] and [13], we applied the greedy idea in combination with reweighted $\ell^1$-minimization to the CS problem of finding the sparsest solutions of an underdetermined system. We used the existing $\ell^1$-minimization scheme with the opportunity to reweight entries. When the basic algorithm fails, the greedy block picks the biggest (the most reliable) entries in the decomposition whose magnitudes are higher than some threshold. Those entries are considered reliable. Therefore, they get a smaller weight in the $\ell^1$ norm, while the other entries compete on the next iterations for the right to be picked up.

A similar idea was employed in our recent paper [15], where the greedy approach was applied to the algorithm from [10], based on the Augmented Lagrange Multipliers method, for the completion of low-rank matrices from incomplete highly corrupted samples. The simple greedy modification of the matrix completion algorithm from [10] gave a boost in the restoration capability of the algorithm.

Now we discuss in detail how the greedy approach can be incorporated in (to be more precise, over) the SSC algorithm. First of all, we introduce a non-negative matrix $\Lambda \in \mathbb{R}^{D\times N}$ whose entries reflect our knowledge about the entries of the error matrix $E$. Let us assume that the regular entries, for which there is no (say, side) information, have the value 1, whereas the entries at the coordinates of presumptive errors are set to a small value or to 0.

Let us consider the mechanism of the influence of the parameter $\lambda_e$ on the output matrix $C$. The parameter $\lambda_e$ sets the balance between a higher level of sparsity of $C$ with a more populated error matrix $E$ and a less sparse $C$ with a less populated matrix $E$. Setting $\lambda_e$ too small produces too many "errors" and a very sparse $C$. However, this is probably not what we want: it would mean that, for the sake of the sparsity of $C$, we introduced too large a distortion into the input data $Y$. At the same time, if we know for sure or almost for sure that some entry of $Y$ with indices $(i,j)$ is corrupted, we lose nothing by assigning to this element an individual small weight in functional (8). This weight can be much less than $\lambda_e$ or even equal to 0. Thus, we have to replace the term $\|E\|_1$ with $\|\Lambda \odot E\|_1$, where the operation $\odot$ means the entrywise product of two matrices. In practice, this means that in formula (12) we will apply different shrinkage thresholds for different indices. Generally speaking, it makes sense to use the whole range of non-negative real numbers to reflect our knowledge about $E$. Say, highly reliable entries have to be protected from distortion by a weight greater than 1 in $\Lambda$. However, in this paper we restrict ourselves to two-level entries: either 1 (no knowledge)
or 10^'' (suspicious to be an error). 
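
The effect of the weights on the error update can be sketched as follows: minimizing $\lambda_e\Lambda_{ij}|E_{ij}| + \frac{\lambda_z}{2}(R_{ij} - E_{ij})^2$ entrywise, with $R = Y - YA$, gives shrinkage with the individual threshold $\Lambda_{ij}\lambda_e/\lambda_z$. The function name and the toy numbers below are hypothetical.

```python
import numpy as np

def weighted_error_update(R, Lam, lam_e, lam_z):
    """Entrywise shrinkage of the residual R with thresholds Lam * lam_e / lam_z."""
    thr = Lam * (lam_e / lam_z)
    return np.sign(R) * np.maximum(np.abs(R) - thr, 0.0)

R = np.array([[0.3, -0.3],
              [2.0, -0.1]])
Lam = np.array([[1.0, 0.0],
                [1.0, 0.0]])          # second column marked as erased or suspicious
E_new = weighted_error_update(R, Lam, lam_e=1.0, lam_z=2.0)
```

With a zero weight the whole residual of a marked entry is absorbed into the error estimate, whereas regular entries keep the common threshold 0.5.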

When no a priori knowledge is available, we set all entries of $\Lambda$ to 1. However, in information theory there is a special form of corrupted information which is called erasure. The loss of network packets is the most typical cause of erasures. Another example of erasure is given by occlusions in video models, when moving objects temporarily overlap the background or each other. Erasures represent missing rather than corrupted information. Erased entries, like entries with errors, have to be restored or at least taken into account. The only difference between erasures and errors is the a priori knowledge of their locations. This additional information allows the values of erasures to be reconstructed more efficiently than errors. So the entries with erasures have to be marked (say, by setting the corresponding entries of $\Lambda$ to 0) before the algorithm starts.

The entries which are suspected to be errors dynamically extend "the map of marked errors" in $\Lambda$ after each iteration of the greedy algorithm.

Thus, in the Greedy Sparse Subspace Clustering (GSSC), we organize an external loop over the sparsification part of the SSC algorithm with an additional input matrix $\Lambda$, which is initialized with ones at regular entries and with some small non-negative number $\kappa \ge 0$ at the places of erased entries.

One iteration of our greedy algorithm consists in running the modified version of SSC and updating $\Lambda$ and $Y$. Our modification of SSC consists of two parts. First, we take into account the matrix $\Lambda$ while computing $E^{k+1}$ by formula (12). Second, we update $\rho$ on each iteration of the greedy algorithm.

After the first iteration, the estimated matrix $E$ is used to set the threshold

$$T^1 = \max\Big(\alpha_1\|E\|_\infty,\ \alpha_2 \max_{1\le j\le N} \operatorname{median}|y_j|\Big),$$

where $0 < \alpha_1, \alpha_2 < 1$. We use the median estimate to avoid a threshold that is too low, which would lead to a large error map when there is no error in the data.

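Starting from the second greedy iteration, we just update the threshold $T^{n+1} = \beta T^n$, $0 < \beta < 1$. The current value $T^{n+1}$ is used for the extension of the error map by the formula

$$\Lambda^{n+1}_{ij} = \begin{cases} \kappa, & |E^{n+1}_{ij}| > T^{n+1}, \\ \Lambda^n_{ij}, & \text{otherwise}, \end{cases}$$

where $E^{n+1}$ is the error matrix obtained on the previous iteration. The updated $\Lambda$ is used for the next iteration.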

In addition, on each iteration of the greedy algorithm, we update the input matrix of the SSC algorithm by the formula $Y^{n+1}_{ij} = Y^n_{ij} - E^n_{ij}$ for the pairs $(i,j)$ marked in the current $\Lambda^n$ as errors/erasures. While $E^n$ is not a genuine matrix of errors, this is not a serious drawback for the original SSC algorithm. However, for GSSC this may lead to an unjustified and very undesirable extension of the set of entries marked in $\Lambda$. A more accurate estimate of the error set in future algorithms may bring significant benefits for GSSC.
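
One greedy step (threshold, extension of the error map, update of the input matrix) might be sketched as below. Several details are assumptions of this sketch rather than statements of the paper: the threshold is read as the maximum of $\alpha_1\max_{ij}|E_{ij}|$ and $\alpha_2$ times the largest column-wise median of $|Y|$, the entries counted as marked are those whose weight is below 1 after the map update, and the value `kappa=1e-3` is a placeholder.

```python
import numpy as np

def first_threshold(E, Y, alpha1=0.4, alpha2=0.5):
    """Assumed reading of the initial threshold built from E and the data medians."""
    return max(alpha1 * np.abs(E).max(),
               alpha2 * np.median(np.abs(Y), axis=0).max())

def greedy_update(Y, Lam, E, T, kappa):
    """Mark entries with |E_ij| > T and subtract E from Y on all marked entries."""
    Lam_new = np.where(np.abs(E) > T, kappa, Lam)
    marked = Lam_new < 1.0                      # flagged as errors or erasures
    Y_new = np.where(marked, Y - E, Y)
    return Y_new, Lam_new

Y = np.array([[1.0, 2.0], [3.0, 4.0]])
E = np.array([[0.1, -2.0], [0.0, 0.5]])
Lam = np.ones_like(Y)
T1 = first_threshold(E, Y)                      # max(0.4 * 2, 0.5 * 3) = 1.5
Y1, Lam1 = greedy_update(Y, Lam, E, T1, kappa=1e-3)
T2 = 0.65 * T1                                  # threshold for the next greedy iteration
```

Only the entry with $|E_{ij}| = 2$ exceeds the threshold here, so a single entry of the map and of the data matrix changes.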

As we mentioned, erasure recovery is easier than error correction. Putting new entries into the map of errors, we, in fact, declare them erasures. If we really have a solid justification for this action, then the new iteration of GSSC can be considered as a new problem which is easier for SSC than the previous iteration.

We will come back to the discussion about interconnection 
between erasures and errors after presentation of numerical 
experiments. 

4. Numerical Experiments 

We present a comparison of GSSC and SSC on synthetic data. The input data were composed in accordance with the model given in [8]. 105 data vectors of dimension $D = 50$ are equally split between three 4-dimensional linear spaces $\{S^i\}_{i=1}^3$. To make the problem more complicated, each of those 3 spaces belongs to the sum of the two others. The smallest angles between the spaces $S^i$ and $S^j$ are defined by the formulas

$$\cos\theta_{ij} = \max_{u \in S^i,\, v \in S^j} \frac{u^Tv}{\|u\|_2\|v\|_2}, \qquad i, j = 1, 2, 3.$$

We construct the data sets using vectors generated by decompositions with random coefficients in orthonormal bases $\{e^i_j\}_{j=1}^4$ of the spaces $S^i$. The three vectors $e^i_1$ belong to the same 2D-plane, with the angles $\angle(e^1_1, e^2_1) = \angle(e^2_1, e^3_1) = \theta$ and $\angle(e^1_1, e^3_1) = 2\theta$. The vectors $e^1_2, e^1_3, e^1_4, e^2_2, e^2_3, e^2_4$ are mutually orthogonal and orthogonal to $e^1_1, e^2_1, e^3_1$; $e^3_j = (e^1_j + e^2_j)/\sqrt{2}$, $j = 2, 3, 4$. The generator of the standard normal distribution is used to generate the data decomposition coefficients. After the generation, a random unitary matrix is applied to the result to avoid zeros in some regions of the matrix $Y$.
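
The construction of the synthetic data can be sketched as follows. The sketch assumes the reading of the bases given above (first basis vectors at angles $0$, $\theta$, $2\theta$ in one plane, the remaining vectors of the third space equal to normalized sums of those of the first two), and it folds the random rotation into the choice of a random orthonormal frame obtained by a QR factorization. Function names and the seed are hypothetical.

```python
import numpy as np

def make_bases(theta_deg, D=50, rng=None):
    """Orthonormal bases (D x 4 each) of three subspaces with smallest angle theta."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.deg2rad(theta_deg)
    Q, _ = np.linalg.qr(rng.standard_normal((D, D)))    # random orthogonal frame
    u, v, f = Q[:, 0], Q[:, 1], Q[:, 2:8]
    # first basis vectors: angles 0, theta, 2*theta inside the plane span(u, v)
    e1 = [np.cos(k * t) * u + np.sin(k * t) * v for k in range(3)]
    # remaining basis vectors; those of the third space mix the first two spaces
    tails = [f[:, :3], f[:, 3:], (f[:, :3] + f[:, 3:]) / np.sqrt(2)]
    return [np.column_stack([e1[i], tails[i]]) for i in range(3)]

def make_data(theta_deg, n_per_space=35, D=50, seed=0):
    """Return Y (D x 3*n_per_space) with Gaussian coefficients and the true labels."""
    rng = np.random.default_rng(seed)
    bases = make_bases(theta_deg, D, rng)
    Y = np.hstack([B @ rng.standard_normal((4, n_per_space)) for B in bases])
    labels = np.repeat(np.arange(3), n_per_space)
    return Y, labels

Y60, labels = make_data(60.0)
Y0, _ = make_data(0.0)
```

With this construction the three spaces together span 8 dimensions for $\theta > 0$ and 7 dimensions for $\theta = 0$, when the first basis vectors coincide.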

We use the notation $P_{ers}$ and $P_{err}$ for the probabilities of erasures and errors, respectively.

When we generate erasures we set random entries of 
the matrix Y with probability Pers to zero and set those 
elements of A to k = lO^"'. 

The coordinates of the samples with errors are generated randomly with probability $P_{err}$. We use the additive model of errors, adding the values of the errors to the correct entries of $Y$. The magnitudes of the errors are taken from the standard normal distribution.
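
The corruption model of the experiments can be sketched as below: errors are added at random positions, erased entries are set to zero, and the erasure positions are recorded in the weight map. The function name, the return values, and the value `kappa=1e-3` are assumptions of this sketch.

```python
import numpy as np

def corrupt(Y, p_err, p_ers, kappa, seed=0):
    """Return corrupted data, the initial weight map, and the two random masks."""
    rng = np.random.default_rng(seed)
    err_mask = rng.random(Y.shape) < p_err
    ers_mask = rng.random(Y.shape) < p_ers
    Yc = Y + err_mask * rng.standard_normal(Y.shape)   # additive standard normal errors
    Yc[ers_mask] = 0.0                                 # erased entries are zeroed
    Lam = np.where(ers_mask, kappa, 1.0)               # erasure locations are known
    return Yc, Lam, err_mask, ers_mask

Y_clean = np.ones((50, 105))
Yc, Lam, err_mask, ers_mask = corrupt(Y_clean, p_err=0.05, p_ers=0.15, kappa=1e-3)
```

The error mask is returned only for evaluation purposes; the algorithm itself is given the erasure locations through `Lam` but not the error locations.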

We run 20 trials of the GSSC and SSC algorithms for each combination of $(\theta, P_{err}, P_{ers})$,

$$0 \le \theta \le 60^\circ, \qquad 0 \le P_{err} \le 0.26, \qquad 0 \le P_{ers} \le 0.4,$$

and output the average values of misclassification. We note that for the angle $\theta = 0$ the spaces $\{S^l\}$ have a common line and $\dim(\oplus_{l=1}^3 S^l) = 7$. Nevertheless, we will see that SSC and especially GSSC show high capability even for this hard setting.

Now we describe the algorithm parameters. We do not use any creative stopping criterion for the greedy iterations. We just make 5 iterations in each of the 20 trials for all combinations $(\theta, P_{err}, P_{ers})$. The set $\Lambda$ is updated after each iteration as described above.

The parameters for the greedy envelope loop are: $\alpha_1 = 0.4$, $\alpha_2 = 0.5$, $\beta = 0.65$.

The input parameters of the basic SSC block are as follows. We set $\alpha_e = 5$, $\alpha_z = 50$, $\rho^0 = 10$, $\mu = 1.05$, $\epsilon = 0.001$. The results presented in Fig. 1 confirm that GSSC has much higher error resilience than SSC. For all models of input data and for both algorithms, "the phase transition curve" is in fact a straight line with a slope of about 0.4. In particular, this means that the influence of one error on clustering approximately corresponds to the influence of 2.5 erasures. We can see that GSSC gives reliable clustering when $P_{err} + 0.4P_{ers} < 0.17$ for $\theta = 60^\circ$ and $P_{err} + 0.4P_{ers} < 0.12$ for $\theta = 6^\circ$. For the case $\theta = 0^\circ$, clustering cannot be absolutely perfect even for GSSC. Indeed, the clustering for $P_{err} > 0$ and $P_{ers} > 0$ cannot be better than for the error-free model. At the same time, GSSC was designed for better error handling. In the error-free case, GSSC has no advantage over the SSC algorithm. Thus, the images in Fig. 1 for $\theta = 0^\circ$ have a gray background of the approximate level 0.042, equal to the rate of misclassification of SSC. We believe that the reason for the misclassification lies in the way we define success. For $\theta = 0^\circ$, there is a common line belonging to all spaces $S^l$. For points close to that line, the considered algorithm has to make a hard decision, appointing only one cluster for each such point.

Fig. 1. Clustering on noise free input. (Panel titles: SSC, θ=0°, no noise; GSSC, θ=0°, no noise.)

Probably, for most applied problems, the information about a point belonging to multiple subspaces is more useful than the selection of a unique space. We advocate such multiple selection because the typical follow-up problem after clustering is the correction of errors in each of the clusters. For this problem, it is not important to which of the clusters the vector belonged from the beginning. When the vector affiliation is really important, side information has to be used.

The second part of the experiment deals with the processing of noisy data. We apply to the matrix $Y$ independent Gaussian noise whose magnitude is 10% of the mean square value of the data matrix $Y$, i.e., the noise level is -20 dB. In Fig. 2, we present the results of processing the noisy input, analogous to the results in Fig. 1. Evidently, this quite strong noise has a minor influence on the clustering efficiency.
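
The relation between the 10% magnitude and the level of -20 dB is $20\log_{10}(0.1) = -20$. A minimal sketch of adding noise at a prescribed level follows; it assumes that the level is measured as the ratio of the root mean square of the noise to that of the data, and the function name is hypothetical.

```python
import numpy as np

def add_noise(Y, level_db, seed=0):
    """Add Gaussian noise whose RMS is 10**(level_db / 20) times the RMS of Y."""
    rng = np.random.default_rng(seed)
    rms = np.sqrt(np.mean(Y ** 2))
    sigma = rms * 10.0 ** (level_db / 20.0)
    return Y + sigma * rng.standard_normal(Y.shape)

rng = np.random.default_rng(1)
Y = rng.standard_normal((50, 105))
Yn = add_noise(Y, -20.0)
measured_db = 20.0 * np.log10(np.sqrt(np.mean((Yn - Y) ** 2)) / np.sqrt(np.mean(Y ** 2)))
```

The same function with `level_db=-15.0` or `level_db=-10.0` would correspond to the stronger noise levels discussed in the text.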

If we increase the noise up to -15 dB, the algorithms still resist. For -10 dB (see Fig. 3), GSSC loses a lot but still outperforms SSC. Those losses are obviously caused by the increase of the noise fraction in the mixture of errors, erasures, and noise, while the greedy idea works efficiently for highly localized corruption like errors and erasures.

We emphasize that all results on Figs. 1-3 were obtained 
with the same algorithm parameters. 

Fig. 2. Clustering on noisy input, SNR=20 dB. (Panel title: GSSC, θ=60°, noise -20 dB.)

Fig. 3. Clustering on noisy input, SNR=10 dB.

In conclusion, we demonstrate the role and dynamics of the greedy iterations. In Table 1, we give the dependence of the misclassification rate on the number of GSSC iterations for $P_{err} = 0.05$, $P_{ers} = 0.15$, SNR=20 dB. The value Iter=0 corresponds to the pure SSC algorithm. The result was obtained as the average value of 100 trials.

Table 1: Misclassification Rate and Number of Iterations

Iter          0      1      2      3      4      5      6
θ = 0°      0.510  0.479  0.351  0.199  0.106  0.053  0.067
θ = 60°     0.465  0.389  0.237  0.053  0.022  0.012  0.010

5. Conclusions and Future Work 

We consider a modification of the Sparse Subspace Clustering algorithm based on a greedy approach, which gives a significant improvement of the algorithm's resilience to (simultaneous) entry corruption, incompleteness, and noise.

While the basic SSC algorithm has some internal resilience to corruption, i.e., it reduces the influence of errors and erasures on the clustering quality, strictly speaking, it does not have error correction capabilities.

Adding an error correction capability may have independent importance, and it may also improve the clustering quality. This direction deserves further research.

In the described version, GSSC runs 5-6 iterations of the SSC algorithm, with a proportional increase of computing time. We believe that this increase can be significantly reduced, while preserving the efficiency of the algorithm, if the updates of $\Lambda$ are incorporated into the internal iterations of the SSC algorithm. We implemented this option in a very similar situation for the acceleration of the $\ell^1$-greedy algorithm in [13]. Finding an appropriate stopping criterion could also reduce computing time and improve clustering.

One more reserve for algorithm improvement is the selection of parameters adaptive to the input data. While, as we mentioned, all results were obtained with the same set of parameters, the adaptation may bring a significant increase of the algorithm capability. One such adaptive solution for error correction in Compressed Sensing was recently found by the authors in [14].